利用S&P500 成分股自製投資組合
做移動窗格法最佳化回測
本研究旨在透過S&P500成分股觀察,選取前四大產業類別之個股
自製投資組合並利用移動窗格最佳化檢驗投資組合是否具有更勝S&P500指數之表現
過程中利用R語言完成網路爬蟲、資料前處理與時間序列格式化、資料視覺化、移動窗格最佳化回測
希望可以透過此篇研究讓金融資料數據化展現
將透過S&P500的產業分析挑選前四大產業個別兩支個股,篩選依據為利用2018/06/19~2020/06/19之股價Mean Log Returns以及Stdev Log Returns作為評斷
組成投資組合後進一步利用2020/06/19~2022/06/19之股價進行移動窗格最佳化(Walk Forward Optimization),
本篇研究設定In-Sample-Data週期為4個月(相關移動窗格法最佳化可參考移動窗格法最佳化欄位說明)
最後利用R相關套件去進行此段時間回測
並與大盤以及未最佳化之投組去做績效分析。
一般來說回測有2種方法:
第一種方法:把手上所有的歷史數據都拿來做backtesting,像下面的這張圖一樣。用所有的歷史數據(稱之為In-Sample-Data)來跑backtesting(下圖中灰色區域)。跑完之後得到理想的參數和模型,然後就用來實盤(下圖中紅色區域)。
但是這種回測有個問題:過度擬合。往往回測結果不錯,但是實盤就是不好看。
另一種方式,稱之為Walk forward Backtesting,前進式回測。
即:如果有12個星期的歷史資料可以回測,我們先用第1到第4個星期的數據來跑優化,然後將第5個星期的數據模擬實盤。這時候第1-4個星期的數據就是In-Sample-Data,第5個星期的績效就是Out-Of-Sample的績效。
然後我們測試窗口後挪一個星期,用第2-5個星期數據的回測,用第6個星期的數據模擬實盤。這樣重複8次。最後將所有out-of-sample的結果結合在一起(就是圖下面綠色的部分)。就會是這個策略實際真正可能出現的績效performance。
這種方法因為做了許多次的walk forward
test(這個例子裡面做了7次),所以我們在真實實盤的時候(就等於是第8次的walk
forward test),實盤績效就會特別接近之前7次的模擬實盤績效的結果。
rvest 套件擷取任意的網頁資料,並將大量結構化的資料直接匯入 R 中,
讓資料分析者可以省去手動整理資料的時間。
dplyr() 套件中融入很多概念與結構化查詢語言(Structured Query Language,SQL)相仿的函數。
搭配 %>% 運算子一起使用,能夠讓我們整理資料的能力獲得一個檔次的提升!
可將所有的數據呈現為交互式數據表,可自行選擇每次呈現數量的頻率為何。
以繪圖文法概念為基礎所發展出來的一套 R 繪圖系統,
可繪製各種高品質圖形的 R 繪圖套件,是一套相當受歡迎的繪圖工具。
此套件中的 fct_reorder() 函數可以將因子 f 類別出現的排列順序依照其他變數更動,
可幫助我們直接排序前四大產業。
此套件用來做股票相關分析,可以利用此套件做到快速獲取股價資料,
以及簡易的K線圖繪製。
此套件中的 map() 函數可以將向量資料運用於函數中,並直接返回 list 形式,
map_dbl() 函數可以將向量資料運用於函數中,並直接返回 dbl 形式。
此套件中的 rownames_to_column() 函數可以協助我們整理資料。
此套件中的 str_c() 函數可以將多字串合併為單一字串
plotly 的 R 圖形庫可以製作交互式、互動式的圖形
在此處可以更方便讀者自行操作。
此套件幫助我們設定投資組合限定與目標條件,
並可以自行設定參數實現移動窗格法最佳化。
此套件幫助我們評估投資組合之績效,以及報酬率圖表可視化
讓我們可以有效分析策略是否有效。
利用 R 語言分析 S&P500 的產業分析挑選前四大產業個別兩支個股,
篩選依據為利用2018/06/19~2020/06/19之股價Mean Log Returns以及Stdev Log Returns作為評斷,
組成投資組合後進一步利用2020/06/19~2022/06/19之股價進行移動窗格最佳化(Walk Forward),
最後利用R相關套件去進行此段時間回測並與大盤以及未最佳化之投組去做績效分析,檢驗策略是否有效。
近來股票市場成為全民關注焦點,其中標準普爾500指數,簡稱S&P500,成為投資人都想要打敗的目標,
而自身在交易的過程中,也時常思考有沒有辦法透過分析股票的歷史數據,去挑選出表現以及成長更好的股票,
在挑選這些股票後,有沒有辦法去好好的管理這些股票的比率,以自製更優於市場表現的基金。
分析架構
利用R語言去爬取S&P500的的成分股,並分析成分股的產業組成,
整理資料成需要的形式,並挑選成分股中前四大相關產業,將各產業挑選2支個股去組成投資組合,篩選方法依據
在設定風險條件(sd.log.returns < 0.03)下歷史數據報酬最高的股票
波動度(Variability)最高的股票
並利用R強大的 PortfolioAnalytics 、 PerformanceAnalytics 的投資組合套件,做到 Walk forward optimization (移動窗格法),並在最後以S&P500、移動窗格法產生的最佳化投資組合、和等比例投資組合標的,做出績效的回報,並給予研究結論。
library(rvest)
library(dplyr)
library(DT)
library(ggplot2)
library(forcats)
# Web-scrape SP500 stock list
sp_500 <- read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies") %>%
html_node("table.wikitable") %>%
html_table() %>%
select("Symbol", Security, "GICS Sector", "GICS Sub-Industry") %>%
as_tibble()
# Format names
names(sp_500) <- sp_500 %>%
names() %>%
make.names()
# Show results
datatable(sp_500, options = list(pageLength = 5))
# show in condensed format
sp_500 %>%
lapply(function(x) x %>% unique() %>% length()) %>%
unlist()
## Symbol Security GICS.Sector GICS.Sub.Industry
## 504 504 11 122
#Visualiztion industry in SP500
sp_500 %>%
# Summarise data by frequency
group_by(GICS.Sector) %>%
summarise(count = n()) %>%
# Visualize
ggplot(aes(x = GICS.Sector %>% fct_reorder(count),
y = count
)) +
geom_bar(stat = "identity",fill = c("gray",
"gray",
"gray",
"gray",
"red",
"red",
"red",
"red",
"gray",
"gray",
"gray")) +
geom_text(aes(label = count), size = 3, nudge_y = 4, nudge_x = .1) +
scale_y_continuous(limits = c(0,100)) +
ggtitle(label = "Sector Frequency Among SP500 Stocks") +
xlab(label = "GICS Sector") +
theme(plot.title = element_text(size = 16)) +
coord_flip()
library(quantmod)
library(purrr)
library(tibble)
library(stringr)
library(plotly)
1.Information Technology
2.Industrials
3.Financials
4.Health Care
# 選取前四大相關產業
sp_500_Information_Technology <- sp_500 %>%
filter(GICS.Sector=="Information Technology")
sp_500_Industrials <- sp_500 %>%
filter(GICS.Sector=="Industrials")
sp_500_Financials <- sp_500 %>%
filter(GICS.Sector=="Financials")
sp_500_Health_Care <- sp_500 %>%
filter(GICS.Sector=="Health Care")
# Get Stock Prices Function
get_stock_prices <- function(ticker, return_format = "tibble", ...) {
# Get stock prices
stock_prices_xts <- getSymbols(Symbols = ticker, auto.assign = FALSE, ...)
Time <- index(stock_prices_xts)
# Rename
names(stock_prices_xts) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted")
# Return in xts format if tibble is not specified
if (return_format == "tibble") {
stock_prices <- stock_prices_xts %>%
as_tibble() %>%
rownames_to_column(var = "Date") %>%
mutate(Date = Time)
} else {
stock_prices <- stock_prices_xts
}
stock_prices
}
\[R = \frac{ln(\frac{V_f}{V_i})}{t}\\V_i =the\ inital\ value\ of\ the\ investment\\V_f = the\ final\ value\ of\ the\ investment\\t = the\ number\ of\ time\ periods\]
# Get Log Returns Function
get_log_returns <- function(x, return_format = "tibble", period = 'daily', ...) {
Time <- x$Date
# Convert tibble to xts
if (!is.xts(x)) {
x <- xts(x[,-1], order.by = x$Date)
}
# Get log returns
log_returns_xts <- periodReturn(x = x$Adjusted, type = 'log', period = period, ...)
# Rename
names(log_returns_xts) <- "Log.Returns"
# Return in xts format if tibble is not specified
if (return_format == "tibble") {
log_returns <- log_returns_xts %>%
as_tibble() %>%
rownames_to_column(var = "Date") %>%
mutate(Date = Time)
} else {
log_returns <- log_returns_xts
}
log_returns
}
(以下將先以Information Technology Company做示範)
(以2018/06/19~2020/06/19股價表現作為之後篩選依據)
# Mapping the Functions to Information Technology Company
sp_500_Information_Technology <- sp_500_Information_Technology %>%
mutate(
stock.prices = map(Symbol,
function(.x) get_stock_prices(.x,
return_format = "tibble",
from = "2018-06-19",
to = "2020-06-19")
),
log.returns = map(stock.prices,
function(.x) get_log_returns(.x, return_format = "tibble")),
mean.log.returns = map_dbl(log.returns, ~ mean(.$Log.Returns)),
sd.log.returns = map_dbl(log.returns, ~ sd(.$Log.Returns)),
n.trade.days = map_dbl(stock.prices, nrow)
)
sp_500_Information_Technology%>%
select(Symbol, stock.prices:log.returns)
## # A tibble: 74 × 3
## Symbol stock.prices log.returns
## <chr> <list> <list>
## 1 ACN <tibble [504 × 7]> <tibble [504 × 2]>
## 2 ADBE <tibble [504 × 7]> <tibble [504 × 2]>
## 3 ADP <tibble [504 × 7]> <tibble [504 × 2]>
## 4 AKAM <tibble [504 × 7]> <tibble [504 × 2]>
## 5 AMD <tibble [504 × 7]> <tibble [504 × 2]>
## 6 APH <tibble [504 × 7]> <tibble [504 × 2]>
## 7 ADI <tibble [504 × 7]> <tibble [504 × 2]>
## 8 ANSS <tibble [504 × 7]> <tibble [504 × 2]>
## 9 AAPL <tibble [504 × 7]> <tibble [504 × 2]>
## 10 AMAT <tibble [504 × 7]> <tibble [504 × 2]>
## # … with 64 more rows
sp_500_Information_Technology$stock.prices[[1]]
## # A tibble: 504 × 7
## Date Open High Low Close Volume Adjusted
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2018-06-19 162. 163. 161. 163. 2574400 155.
## 2 2018-06-20 163 163. 162. 162. 2026800 153.
## 3 2018-06-21 161. 162. 160. 160. 2034200 151.
## 4 2018-06-22 160. 160. 158. 160. 3623900 151.
## 5 2018-06-25 158. 158. 155. 156. 3134500 148.
## 6 2018-06-26 157. 158. 156. 157. 2164000 148.
## 7 2018-06-27 157. 159. 155. 155. 3191900 147.
## 8 2018-06-28 160. 166. 160. 164. 5801100 156.
## 9 2018-06-29 164. 166. 163. 164. 3370000 155.
## 10 2018-07-02 162. 163. 161. 163. 2279400 154.
## # … with 494 more rows
sp_500_Information_Technology %>%
select(Symbol, mean.log.returns:n.trade.days)
## # A tibble: 74 × 4
## Symbol mean.log.returns sd.log.returns n.trade.days
## <chr> <dbl> <dbl> <dbl>
## 1 ACN 0.000490 0.0195 504
## 2 ADBE 0.00102 0.0246 504
## 3 ADP 0.000264 0.0213 504
## 4 AKAM 0.000415 0.0208 504
## 5 AMD 0.00233 0.0397 504
## 6 APH 0.000216 0.0207 504
## 7 ADI 0.000489 0.0261 504
## 8 ANSS 0.000978 0.0230 504
## 9 AAPL 0.00132 0.0227 504
## 10 AMAT 0.000503 0.0314 504
## # … with 64 more rows
此處圖表為互動式圖表,使用者可以將滑鼠游標放置圖表上,便可查看資料的資訊
plot_ly(data = sp_500_Information_Technology,
type = "scatter",
mode = "markers",
x = ~ sd.log.returns,
y = ~ mean.log.returns,
color = ~ n.trade.days,
colors = "Blues",
size = ~ n.trade.days,
text = ~ str_c("<em>", Security, "</em><br>",
"Ticker: ", Symbol, "<br>",
"Sub Sector: ", GICS.Sub.Industry, "<br>",
"No. of Trading Days: ", n.trade.days),
marker = list(opacity = 0.8,
symbol = 'circle',
sizemode = 'diameter',
sizeref = 4.0,
line = list(width = 2, color = '#FFFFFF'))
) %>%
layout(title = 'S&P500_Information_Technology Analysis: Stock Risk vs Reward',
xaxis = list(title = 'Risk/Variability (StDev Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'Reward/Growth (Mean Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwith = 2),
margin = list(l = 100,
t = 100,
b = 100),
font = list(color = '#FFFFFF'),
paper_bgcolor = 'rgb(0, 0, 0)',
plot_bgcolor = 'rgb(0, 0, 0)')
# METHOD1 : sd.log.returns < 0.03 and Highest mean.log.returns
sp_500_Information_Technology_picked_1 <- sp_500_Information_Technology %>%
filter(mean.log.returns >= 0.001,
sd.log.returns < 0.03) %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(mean.log.returns %>% desc())
sp_500_Information_Technology_picked_1[1,]
## # A tibble: 1 × 4
## Symbol mean.log.returns sd.log.returns n.trade.days
## <chr> <dbl> <dbl> <dbl>
## 1 NOW 0.00155 0.0283 504
# METHOD2 : Highest sd.log.returns
sp_500_Information_Technology_picked_2 <- sp_500_Information_Technology %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(sd.log.returns %>% desc())
sp_500_Information_Technology_picked_2[1,]
## # A tibble: 1 × 4
## Symbol mean.log.returns sd.log.returns n.trade.days
## <chr> <dbl> <dbl> <dbl>
## 1 ENPH 0.00396 0.0554 504
# NOW
as.character(sp_500_Information_Technology_picked_1[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Information_Technology_picked_1[1,1]))
# ENPH
as.character(sp_500_Information_Technology_picked_2[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Information_Technology_picked_2[1,1]))
可以看出我們篩選出的標的在過去的表現都很不錯
成長率高,且趨勢持續向上
最終我們篩選出四大產業的各兩支股票
1.Information Technology : “NOW” “ENPH”
2.Industrials : “GNRC” “CARR”
3.Financials : “MKTX” “LNC”
4.Health Care : “WST” “MRNA”
results <- c("NOW","ENPH","GNRC","CARR","MKTX","LNC","WST","MRNA")
tickers <- results
library(PortfolioAnalytics)
library(PerformanceAnalytics)
portfolioPrices <- NULL
for (ticker in tickers) {
portfolioPrices <- cbind(portfolioPrices,
getSymbols.yahoo(ticker, from = "2020-06-19",to = "2022-06-19",
periodicity = "daily", auto.assign = F)[,4])
}
datatable(portfolioPrices, options = list(pageLength = 5))
-交易成本 : 0.75 %
-單一個股於投組最多比例 : 20 %
-單一個股於投組最少比例 : 5 %
-投資組合單日波動度 : 0.5 %
portfolioReturns <- na.omit(ROC(portfolioPrices))
portf <- portfolio.spec(colnames(portfolioReturns))
portf <- add.constraint(portf, type = 'weight_sum', min_sum = 0.99, max_sum = 1.01)
portf <- add.constraint(portf, type = "transaction_cost", ptc = 0.0075)
portf <- add.constraint(portf, type = 'box', min = .05, max = .20)
portf <- add.objective(portf, type = 'return', name = "mean")
portf <- add.objective(portf, type = 'risk', name = "StdDev", target = 0.005)
set.seed(26736728)
rp <- random_portfolios(portf, 10000, "sample")
opt_rebal <- optimize.portfolio.rebalancing(portfolioReturns,
portf,
optimize_method = "random",
rp = rp,
rebalance_on = "months",
training_period = 1,
rolling_window = 4)
opt_rebal
## **************************************************
## PortfolioAnalytics Optimization with Rebalancing
## **************************************************
##
## Call:
## optimize.portfolio.rebalancing(R = portfolioReturns, portfolio = portf,
## optimize_method = "random", rp = rp, rebalance_on = "months",
## training_period = 1, rolling_window = 4)
##
## Number of rebalancing dates: 25
## First rebalance date:
## [1] "2020-06-30"
## Last rebalance date:
## [1] "2022-06-17"
##
## Annualized Portfolio Rebalancing Return:
## [1] 0.1086059
##
## Annualized Portfolio Standard Deviation:
## [1] 0.2775822
資產重新平衡次數:25次
投資組合年化報酬:10.86%
投資組合年化標準差:27.76%
chart.Weights(opt_rebal, main = "Rebalanced Weights Over Time")
1.維持個股等比例的投資組合
2.S&P500
3.移動窗格法最佳化投資組合
# 等比例之投資組合報酬
equal_weight <- rep(1/ncol(portfolioReturns),ncol(portfolioReturns))
benchmark <- Return.portfolio(portfolioReturns,weights = equal_weight)
colnames(benchmark) <- "Benchmark Portfolio"
# S&P500報酬
sp500prices <- getSymbols.yahoo("SPY", from = "2020-06-19",to = "2022-06-19", periodicity = "daily", auto.assign = F)[,4]
sp500Rets <- na.omit(ROC(sp500prices))
sp500Rets <- as.xts(sp500Rets)
# 移動窗格法最佳化投資組合之報酬
rebal_weights <- extractWeights(opt_rebal)
rebal_returns <- Return.portfolio(portfolioReturns, weights = rebal_weights)
rets_df <- cbind(rebal_returns, benchmark, sp500Rets)
charts.PerformanceSummary(rets_df, main = "P/L Over Time")
可以看出我們的投資組合的累積報酬(黑色線)基本上都是勝過追蹤S&P500之ETF SPY(綠色線)的
雖然在獲利的時候未能勝過等比例的篩選出股票的投資組合(紅色線)
但在大盤開始下跌的時候,我們的投資組合較為抗跌
也因此可以看到在2021年末我們的投資組合成為了三種資產中報酬最好的選擇
可以推論出以下兩點
我們的選股策略有效 (紅線與黑線一直都在綠線之上)
我們的移動窗格最佳化有效,讓我們在下跌的時候起到更好的抗跌作用
透過這份研究,這樣的選股策略以及資產配置的調整方法或許可以運用在我們日常的基金管理。
rets_df <- cbind(rebal_returns, benchmark, sp500Rets)
charts.PerformanceSummary(rets_df, main = "P/L Over Time")
此處程式碼為撰寫此份專案的工作R檔,因此排版方面未能做到最好,
但還是希望附上給有興趣的人參考。
library(rvest)
library(forcats)
library(plotly)
library(stringr)
library(purrr)
library(seminr)
# 架構圖
model <- relationships(
paths(from = "SP500",to = c("Information Technology","Industrials","Financials","Health Care")),
paths(from = "Information Technology", to = multi_items("Information_Technology_",1:2)),
paths(from = "Industrials", to = multi_items("Industrials_",1:2)),
paths(from = "Financials", to = multi_items("Financials_",1:2)),
paths(from = "Health Care", to = multi_items("Health_Care_",1:2)),
paths(from = multi_items("Information_Technology_",1:2),to ="Portfolio"),
paths(from = multi_items("Industrials_",1:2), to = "Portfolio"),
paths(from = multi_items("Financials_",1:2), to ="Portfolio"),
paths(from = multi_items("Health_Care_",1:2), to ="Portfolio"),
paths(from = "Portfolio",to = "Performance Analytics")
)
plot(model)
# Web-scrape SP500 stock list
sp_500 <- read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies") %>%
html_node("table.wikitable") %>%
html_table() %>%
select("Symbol", Security, "GICS Sector", "GICS Sub-Industry") %>%
as_tibble()
# Format names
names(sp_500) <- sp_500 %>%
names() %>%
make.names()
# Show results
sp_500
# show in condensed format
sp_500 %>%
lapply(function(x) x %>% unique() %>% length()) %>%
unlist()
#Visualiztion industry in SP500
sp_500 %>%
# Summarise data by frequency
group_by(GICS.Sector) %>%
summarise(count = n()) %>%
# Visualize
ggplot(aes(x = GICS.Sector %>% fct_reorder(count),
y = count
)) +
geom_bar(stat = "identity") +
geom_text(aes(label = count), size = 3, nudge_y = 4, nudge_x = .1) +
scale_y_continuous(limits = c(0,100)) +
ggtitle(label = "Sector Frequency Among SP500 Stocks") +
xlab(label = "GICS Sector") +
theme(plot.title = element_text(size = 16)) +
coord_flip()
# 選取前五大相關產業
sp_500_Information_Technology <- sp_500 %>%
filter(GICS.Sector=="Information Technology")
sp_500_Industrials <- sp_500 %>%
filter(GICS.Sector=="Industrials")
sp_500_Financials <- sp_500 %>%
filter(GICS.Sector=="Financials")
sp_500_Health_Care <- sp_500 %>%
filter(GICS.Sector=="Health Care")
library(quantmod)
library(tibble)
# Get Stock Prices Function
get_stock_prices <- function(ticker, return_format = "tibble", ...) {
# Get stock prices
stock_prices_xts <- getSymbols(Symbols = ticker, auto.assign = FALSE, ...)
Time <- index(stock_prices_xts)
# Rename
names(stock_prices_xts) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted")
# Return in xts format if tibble is not specified
if (return_format == "tibble") {
stock_prices <- stock_prices_xts %>%
as_tibble() %>%
rownames_to_column(var = "Date") %>%
mutate(Date = Time)
} else {
stock_prices <- stock_prices_xts
}
stock_prices
}
# Get Log Returns Function
get_log_returns <- function(x, return_format = "tibble", period = 'daily', ...) {
Time <- x$Date
# Convert tibble to xts
if (!is.xts(x)) {
x <- xts(x[,-1], order.by = x$Date)
}
# Get log returns
log_returns_xts <- periodReturn(x = x$Adjusted, type = 'log', period = period, ...)
# Rename
names(log_returns_xts) <- "Log.Returns"
# Return in xts format if tibble is not specified
if (return_format == "tibble") {
log_returns <- log_returns_xts %>%
as_tibble() %>%
rownames_to_column(var = "Date") %>%
mutate(Date = Time)
} else {
log_returns <- log_returns_xts
}
log_returns
}
# Mapping the Functions to Information Technology Company
sp_500_Information_Technology <- sp_500_Information_Technology %>%
mutate(
stock.prices = map(Symbol,
function(.x) get_stock_prices(.x,
return_format = "tibble",
from = "2018-06-19",
to = "2020-06-19")
),
log.returns = map(stock.prices,
function(.x) get_log_returns(.x, return_format = "tibble")),
mean.log.returns = map_dbl(log.returns, ~ mean(.$Log.Returns)),
sd.log.returns = map_dbl(log.returns, ~ sd(.$Log.Returns)),
n.trade.days = map_dbl(stock.prices, nrow)
)
# 觀察資料
sp_500_Information_Technology%>%
select(Symbol, stock.prices:log.returns)
# 其中的第一個資料裡的stock.prices的tibble(ACN)
sp_500_Information_Technology$stock.prices[[1]]
# 選取我們需要使用的資料
sp_500_Information_Technology %>%
select(Symbol, mean.log.returns:n.trade.days)
# Visualize S&P500_Information_Technology Analysis: Stock Risk vs Reward
plot_ly(data = sp_500_Information_Technology,
type = "scatter",
mode = "markers",
x = ~ sd.log.returns,
y = ~ mean.log.returns,
color = ~ n.trade.days,
colors = "Blues",
size = ~ n.trade.days,
text = ~ str_c("<em>", Security, "</em><br>",
"Ticker: ", Symbol, "<br>",
"Sub Sector: ", GICS.Sub.Industry, "<br>",
"No. of Trading Days: ", n.trade.days),
marker = list(opacity = 0.8,
symbol = 'circle',
sizemode = 'diameter',
sizeref = 4.0,
line = list(width = 2, color = '#FFFFFF'))
) %>%
layout(title = 'S&P500_Information_Technology Analysis: Stock Risk vs Reward',
xaxis = list(title = 'Risk/Variability (StDev Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'Reward/Growth (Mean Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwith = 2),
margin = list(l = 100,
t = 100,
b = 100),
font = list(color = '#FFFFFF'),
paper_bgcolor = 'rgb(0, 0, 0)',
plot_bgcolor = 'rgb(0, 0, 0)')
# 篩選標的
# METHOD1 : sd.log.returns < 0.03 and Highest mean.log.returns
sp_500_Information_Technology_picked_1 <- sp_500_Information_Technology %>%
filter(mean.log.returns >= 0.001,
sd.log.returns < 0.03) %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(mean.log.returns %>% desc())
sp_500_Information_Technology_picked_1[1,]
# METHOD2 : Highest sd.log.returns
sp_500_Information_Technology_picked_2 <- sp_500_Information_Technology %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(sd.log.returns %>% desc())
sp_500_Information_Technology_picked_2[1,]
# 挑選標的
# NOW
as.character(sp_500_Information_Technology_picked_1[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Information_Technology_picked_1[1,1]))
# ENPH
as.character(sp_500_Information_Technology_picked_2[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Information_Technology_picked_2[1,1]))
# Mapping the Functions to Industrials Company
sp_500_Industrials <- sp_500_Industrials %>%
mutate(
stock.prices = map(Symbol,
function(.x) get_stock_prices(.x,
return_format = "tibble",
from = "2018-06-19",
to = "2020-06-19")
),
log.returns = map(stock.prices,
function(.x) get_log_returns(.x, return_format = "tibble")),
mean.log.returns = map_dbl(log.returns, ~ mean(.$Log.Returns)),
sd.log.returns = map_dbl(log.returns, ~ sd(.$Log.Returns)),
n.trade.days = map_dbl(stock.prices, nrow)
)
# Visualize S&P500_Industrials Analysis: Stock Risk vs Reward
plot_ly(data = sp_500_Industrials,
type = "scatter",
mode = "markers",
x = ~ sd.log.returns,
y = ~ mean.log.returns,
color = ~ n.trade.days,
colors = "Blues",
size = ~ n.trade.days,
text = ~ str_c("<em>", Security, "</em><br>",
"Ticker: ", Symbol, "<br>",
"Sub Sector: ", GICS.Sub.Industry, "<br>",
"No. of Trading Days: ", n.trade.days),
marker = list(opacity = 0.8,
symbol = 'circle',
sizemode = 'diameter',
sizeref = 4.0,
line = list(width = 2, color = '#FFFFFF'))
) %>%
layout(title = 'S&P500_Industrials Analysis: Stock Risk vs Reward',
xaxis = list(title = 'Risk/Variability (StDev Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'Reward/Growth (Mean Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwith = 2),
margin = list(l = 100,
t = 100,
b = 100),
font = list(color = '#FFFFFF'),
paper_bgcolor = 'rgb(0, 0, 0)',
plot_bgcolor = 'rgb(0, 0, 0)')
# 篩選標的
# METHOD1 : sd.log.returns < 0.03 and Highest mean.log.returns
sp_500_Industrials_picked_1 <- sp_500_Industrials %>%
filter(mean.log.returns >= 0.001,
sd.log.returns < 0.03) %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(mean.log.returns %>% desc())
sp_500_Industrials_picked_1
sp_500_Industrials_picked_1[1,]
# METHOD2 : Highest sd.log.returns
sp_500_Industrials_picked_2 <- sp_500_Industrials %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(sd.log.returns %>% desc())
sp_500_Industrials_picked_2
sp_500_Industrials_picked_2[1,]
# 挑選標的
# GNRC
as.character(sp_500_Industrials_picked_1[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Industrials_picked_1[1,1]))
# CARR
as.character(sp_500_Industrials_picked_2[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Industrials_picked_2[1,1]))
# 把警告的資料排除問題
sp_500_Health_Care <- sp_500_Health_Care %>%
filter(Symbol!= c("OGN"))
# Mapping the Functions to Health Care Company
sp_500_Health_Care <- sp_500_Health_Care %>%
mutate(
stock.prices = map(Symbol,
function(.x) get_stock_prices(.x,
return_format = "tibble",
from = "2018-06-19",
to = "2020-06-19")
),
log.returns = map(stock.prices,
function(.x) get_log_returns(.x, return_format = "tibble")),
mean.log.returns = map_dbl(log.returns, ~ mean(.$Log.Returns)),
sd.log.returns = map_dbl(log.returns, ~ sd(.$Log.Returns)),
n.trade.days = map_dbl(stock.prices, nrow)
)
# Visualize S&P500_Health_Care Analysis: Stock Risk vs Reward
plot_ly(data = sp_500_Health_Care,
type = "scatter",
mode = "markers",
x = ~ sd.log.returns,
y = ~ mean.log.returns,
color = ~ n.trade.days,
colors = "Blues",
size = ~ n.trade.days,
text = ~ str_c("<em>", Security, "</em><br>",
"Ticker: ", Symbol, "<br>",
"Sub Sector: ", GICS.Sub.Industry, "<br>",
"No. of Trading Days: ", n.trade.days),
marker = list(opacity = 0.8,
symbol = 'circle',
sizemode = 'diameter',
sizeref = 4.0,
line = list(width = 2, color = '#FFFFFF'))
) %>%
layout(title = 'S&P500_Health_Care Analysis: Stock Risk vs Reward',
xaxis = list(title = 'Risk/Variability (StDev Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'Reward/Growth (Mean Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwith = 2),
margin = list(l = 100,
t = 100,
b = 100),
font = list(color = '#FFFFFF'),
paper_bgcolor = 'rgb(0, 0, 0)',
plot_bgcolor = 'rgb(0, 0, 0)')
# 篩選標的
# METHOD1 : sd.log.returns < 0.03 and Highest mean.log.returns
sp_500_Health_Care_picked_1 <- sp_500_Health_Care %>%
filter(mean.log.returns >= 0.001,
sd.log.returns < 0.03) %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(mean.log.returns %>% desc())
sp_500_Health_Care_picked_1
sp_500_Health_Care_picked_1[1,]
# METHOD2 : Highest sd.log.returns
sp_500_Health_Care_picked_2 <- sp_500_Health_Care %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(sd.log.returns %>% desc())
sp_500_Health_Care_picked_2
sp_500_Health_Care_picked_2[1,]
# 挑選標的
# WST
as.character(sp_500_Health_Care_picked_1[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Health_Care_picked_1[1,1]))
# MRNA
as.character(sp_500_Health_Care_picked_2[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Health_Care_picked_2[1,1]))
# 把警告的資料排除問題
sp_500_Financials <- sp_500_Financials %>%
filter(Symbol!= c("BRK.B"))
# Mapping the Functions to Financials Company
sp_500_Financials <- sp_500_Financials %>%
mutate(
stock.prices = map(Symbol,
function(.x) get_stock_prices(.x,
return_format = "tibble",
from = "2018-06-19",
to = "2020-06-19")
),
log.returns = map(stock.prices,
function(.x) get_log_returns(.x, return_format = "tibble")),
mean.log.returns = map_dbl(log.returns, ~ mean(.$Log.Returns)),
sd.log.returns = map_dbl(log.returns, ~ sd(.$Log.Returns)),
n.trade.days = map_dbl(stock.prices, nrow)
)
# Visualize S&P500_Financials Analysis: Stock Risk vs Reward
plot_ly(data = sp_500_Financials,
type = "scatter",
mode = "markers",
x = ~ sd.log.returns,
y = ~ mean.log.returns,
color = ~ n.trade.days,
colors = "Blues",
size = ~ n.trade.days,
text = ~ str_c("<em>", Security, "</em><br>",
"Ticker: ", Symbol, "<br>",
"Sub Sector: ", GICS.Sub.Industry, "<br>",
"No. of Trading Days: ", n.trade.days),
marker = list(opacity = 0.8,
symbol = 'circle',
sizemode = 'diameter',
sizeref = 4.0,
line = list(width = 2, color = '#FFFFFF'))
) %>%
layout(title = 'S&P500_Financials Analysis: Stock Risk vs Reward',
xaxis = list(title = 'Risk/Variability (StDev Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'Reward/Growth (Mean Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwith = 2),
margin = list(l = 100,
t = 100,
b = 100),
font = list(color = '#FFFFFF'),
paper_bgcolor = 'rgb(0, 0, 0)',
plot_bgcolor = 'rgb(0, 0, 0)')
# 篩選標的
# METHOD1 : sd.log.returns < 0.03 and Highest mean.log.returns
sp_500_Financials_picked_1 <- sp_500_Financials %>%
filter(mean.log.returns >= 0.001,
sd.log.returns < 0.03) %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(mean.log.returns %>% desc())
sp_500_Financials_picked_1
sp_500_Financials_picked_1[1,]
# METHOD2 : Highest sd.log.returns
sp_500_Financials_picked_2 <- sp_500_Financials %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(sd.log.returns %>% desc())
sp_500_Financials_picked_2
sp_500_Financials_picked_2[1,]
# 挑選標的
# MKTX
as.character(sp_500_Financials_picked_1[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Financials_picked_1[1,1]))
# LNC
as.character(sp_500_Financials_picked_2[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Financials_picked_2[1,1]))
# 最終篩選結果
results <-c(sp_500_Information_Technology_picked_1[1,1],sp_500_Information_Technology_picked_2[1,1],
sp_500_Industrials_picked_1[1,1],sp_500_Industrials_picked_2[1,1],
sp_500_Financials_picked_1[1,1],sp_500_Financials_picked_2[1,1],
sp_500_Health_Care_picked_1[1,1],sp_500_Health_Care_picked_2[1,1]) %>%
as.character()
results
library(rvest)
library(forcats)
library(plotly)
library(stringr)
library(purrr)
library(seminr)
# 架構圖
model <- relationships(
paths(from = "SP500",to = c("Information Technology","Industrials","Financials","Health Care")),
paths(from = "Information Technology", to = multi_items("Information_Technology_",1:2)),
paths(from = "Industrials", to = multi_items("Industrials_",1:2)),
paths(from = "Financials", to = multi_items("Financials_",1:2)),
paths(from = "Health Care", to = multi_items("Health_Care_",1:2)),
paths(from = multi_items("Information_Technology_",1:2),to ="Portfolio"),
paths(from = multi_items("Industrials_",1:2), to = "Portfolio"),
paths(from = multi_items("Financials_",1:2), to ="Portfolio"),
paths(from = multi_items("Health_Care_",1:2), to ="Portfolio"),
paths(from = "Portfolio",to = "Performance Analytics")
)
plot(model)
# Web-scrape SP500 stock list
sp_500 <- read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies") %>%
html_node("table.wikitable") %>%
html_table() %>%
select("Symbol", Security, "GICS Sector", "GICS Sub-Industry") %>%
as_tibble()
# Format names
names(sp_500) <- sp_500 %>%
names() %>%
make.names()
# Show results
sp_500
# show in condensed format
sp_500 %>%
lapply(function(x) x %>% unique() %>% length()) %>%
unlist()
#Visualiztion industry in SP500
sp_500 %>%
# Summarise data by frequency
group_by(GICS.Sector) %>%
summarise(count = n()) %>%
# Visualize
ggplot(aes(x = GICS.Sector %>% fct_reorder(count),
y = count
)) +
geom_bar(stat = "identity") +
geom_text(aes(label = count), size = 3, nudge_y = 4, nudge_x = .1) +
scale_y_continuous(limits = c(0,100)) +
ggtitle(label = "Sector Frequency Among SP500 Stocks") +
xlab(label = "GICS Sector") +
theme(plot.title = element_text(size = 16)) +
coord_flip()
# 選取前五大相關產業
sp_500_Information_Technology <- sp_500 %>%
filter(GICS.Sector=="Information Technology")
sp_500_Industrials <- sp_500 %>%
filter(GICS.Sector=="Industrials")
sp_500_Financials <- sp_500 %>%
filter(GICS.Sector=="Financials")
sp_500_Health_Care <- sp_500 %>%
filter(GICS.Sector=="Health Care")
library(quantmod)
library(tibble)
# Get Stock Prices Function
get_stock_prices <- function(ticker, return_format = "tibble", ...) {
# Get stock prices
stock_prices_xts <- getSymbols(Symbols = ticker, auto.assign = FALSE, ...)
Time <- index(stock_prices_xts)
# Rename
names(stock_prices_xts) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted")
# Return in xts format if tibble is not specified
if (return_format == "tibble") {
stock_prices <- stock_prices_xts %>%
as_tibble() %>%
rownames_to_column(var = "Date") %>%
mutate(Date = Time)
} else {
stock_prices <- stock_prices_xts
}
stock_prices
}
# Get Log Returns Function
get_log_returns <- function(x, return_format = "tibble", period = 'daily', ...) {
Time <- x$Date
# Convert tibble to xts
if (!is.xts(x)) {
x <- xts(x[,-1], order.by = x$Date)
}
# Get log returns
log_returns_xts <- periodReturn(x = x$Adjusted, type = 'log', period = period, ...)
# Rename
names(log_returns_xts) <- "Log.Returns"
# Return in xts format if tibble is not specified
if (return_format == "tibble") {
log_returns <- log_returns_xts %>%
as_tibble() %>%
rownames_to_column(var = "Date") %>%
mutate(Date = Time)
} else {
log_returns <- log_returns_xts
}
log_returns
}
# Mapping the Functions to Information Technology Company
sp_500_Information_Technology <- sp_500_Information_Technology %>%
mutate(
stock.prices = map(Symbol,
function(.x) get_stock_prices(.x,
return_format = "tibble",
from = "2018-06-19",
to = "2020-06-19")
),
log.returns = map(stock.prices,
function(.x) get_log_returns(.x, return_format = "tibble")),
mean.log.returns = map_dbl(log.returns, ~ mean(.$Log.Returns)),
sd.log.returns = map_dbl(log.returns, ~ sd(.$Log.Returns)),
n.trade.days = map_dbl(stock.prices, nrow)
)
# 觀察資料
sp_500_Information_Technology%>%
select(Symbol, stock.prices:log.returns)
# 其中的第一個資料裡的stock.prices的tibble(ACN)
sp_500_Information_Technology$stock.prices[[1]]
# 選取我們需要使用的資料
sp_500_Information_Technology %>%
select(Symbol, mean.log.returns:n.trade.days)
# Visualize S&P500_Information_Technology Analysis: Stock Risk vs Reward
plot_ly(data = sp_500_Information_Technology,
type = "scatter",
mode = "markers",
x = ~ sd.log.returns,
y = ~ mean.log.returns,
color = ~ n.trade.days,
colors = "Blues",
size = ~ n.trade.days,
text = ~ str_c("<em>", Security, "</em><br>",
"Ticker: ", Symbol, "<br>",
"Sub Sector: ", GICS.Sub.Industry, "<br>",
"No. of Trading Days: ", n.trade.days),
marker = list(opacity = 0.8,
symbol = 'circle',
sizemode = 'diameter',
sizeref = 4.0,
line = list(width = 2, color = '#FFFFFF'))
) %>%
layout(title = 'S&P500_Information_Technology Analysis: Stock Risk vs Reward',
xaxis = list(title = 'Risk/Variability (StDev Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'Reward/Growth (Mean Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwith = 2),
margin = list(l = 100,
t = 100,
b = 100),
font = list(color = '#FFFFFF'),
paper_bgcolor = 'rgb(0, 0, 0)',
plot_bgcolor = 'rgb(0, 0, 0)')
# 篩選標的
# METHOD1 : sd.log.returns < 0.03 and Highest mean.log.returns
sp_500_Information_Technology_picked_1 <- sp_500_Information_Technology %>%
filter(mean.log.returns >= 0.001,
sd.log.returns < 0.03) %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(mean.log.returns %>% desc())
sp_500_Information_Technology_picked_1[1,]
# METHOD2 : Highest sd.log.returns
sp_500_Information_Technology_picked_2 <- sp_500_Information_Technology %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(sd.log.returns %>% desc())
sp_500_Information_Technology_picked_2[1,]
# 挑選標的
# NOW
as.character(sp_500_Information_Technology_picked_1[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Information_Technology_picked_1[1,1]))
# ENPH
as.character(sp_500_Information_Technology_picked_2[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Information_Technology_picked_2[1,1]))
# Mapping the Functions to Industrials Company
sp_500_Industrials <- sp_500_Industrials %>%
mutate(
stock.prices = map(Symbol,
function(.x) get_stock_prices(.x,
return_format = "tibble",
from = "2018-06-19",
to = "2020-06-19")
),
log.returns = map(stock.prices,
function(.x) get_log_returns(.x, return_format = "tibble")),
mean.log.returns = map_dbl(log.returns, ~ mean(.$Log.Returns)),
sd.log.returns = map_dbl(log.returns, ~ sd(.$Log.Returns)),
n.trade.days = map_dbl(stock.prices, nrow)
)
# Visualize S&P500_Industrials Analysis: Stock Risk vs Reward
plot_ly(data = sp_500_Industrials,
type = "scatter",
mode = "markers",
x = ~ sd.log.returns,
y = ~ mean.log.returns,
color = ~ n.trade.days,
colors = "Blues",
size = ~ n.trade.days,
text = ~ str_c("<em>", Security, "</em><br>",
"Ticker: ", Symbol, "<br>",
"Sub Sector: ", GICS.Sub.Industry, "<br>",
"No. of Trading Days: ", n.trade.days),
marker = list(opacity = 0.8,
symbol = 'circle',
sizemode = 'diameter',
sizeref = 4.0,
line = list(width = 2, color = '#FFFFFF'))
) %>%
layout(title = 'S&P500_Industrials Analysis: Stock Risk vs Reward',
xaxis = list(title = 'Risk/Variability (StDev Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'Reward/Growth (Mean Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwith = 2),
margin = list(l = 100,
t = 100,
b = 100),
font = list(color = '#FFFFFF'),
paper_bgcolor = 'rgb(0, 0, 0)',
plot_bgcolor = 'rgb(0, 0, 0)')
# 篩選標的
# METHOD1 : sd.log.returns < 0.03 and Highest mean.log.returns
sp_500_Industrials_picked_1 <- sp_500_Industrials %>%
filter(mean.log.returns >= 0.001,
sd.log.returns < 0.03) %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(mean.log.returns %>% desc())
sp_500_Industrials_picked_1
sp_500_Industrials_picked_1[1,]
# METHOD2 : Highest sd.log.returns
sp_500_Industrials_picked_2 <- sp_500_Industrials %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(sd.log.returns %>% desc())
sp_500_Industrials_picked_2
sp_500_Industrials_picked_2[1,]
# 挑選標的
# GNRC
as.character(sp_500_Industrials_picked_1[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Industrials_picked_1[1,1]))
# CARR
as.character(sp_500_Industrials_picked_2[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Industrials_picked_2[1,1]))
# 把警告的資料排除問題
sp_500_Health_Care <- sp_500_Health_Care %>%
filter(Symbol!= c("OGN"))
# Mapping the Functions to Health Care Company
sp_500_Health_Care <- sp_500_Health_Care %>%
mutate(
stock.prices = map(Symbol,
function(.x) get_stock_prices(.x,
return_format = "tibble",
from = "2018-06-19",
to = "2020-06-19")
),
log.returns = map(stock.prices,
function(.x) get_log_returns(.x, return_format = "tibble")),
mean.log.returns = map_dbl(log.returns, ~ mean(.$Log.Returns)),
sd.log.returns = map_dbl(log.returns, ~ sd(.$Log.Returns)),
n.trade.days = map_dbl(stock.prices, nrow)
)
# Visualize S&P500_Health_Care Analysis: Stock Risk vs Reward
plot_ly(data = sp_500_Health_Care,
type = "scatter",
mode = "markers",
x = ~ sd.log.returns,
y = ~ mean.log.returns,
color = ~ n.trade.days,
colors = "Blues",
size = ~ n.trade.days,
text = ~ str_c("<em>", Security, "</em><br>",
"Ticker: ", Symbol, "<br>",
"Sub Sector: ", GICS.Sub.Industry, "<br>",
"No. of Trading Days: ", n.trade.days),
marker = list(opacity = 0.8,
symbol = 'circle',
sizemode = 'diameter',
sizeref = 4.0,
line = list(width = 2, color = '#FFFFFF'))
) %>%
layout(title = 'S&P500_Health_Care Analysis: Stock Risk vs Reward',
xaxis = list(title = 'Risk/Variability (StDev Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'Reward/Growth (Mean Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwith = 2),
margin = list(l = 100,
t = 100,
b = 100),
font = list(color = '#FFFFFF'),
paper_bgcolor = 'rgb(0, 0, 0)',
plot_bgcolor = 'rgb(0, 0, 0)')
# 篩選標的
# METHOD1 : sd.log.returns < 0.03 and Highest mean.log.returns
sp_500_Health_Care_picked_1 <- sp_500_Health_Care %>%
filter(mean.log.returns >= 0.001,
sd.log.returns < 0.03) %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(mean.log.returns %>% desc())
sp_500_Health_Care_picked_1
sp_500_Health_Care_picked_1[1,]
# METHOD2 : Highest sd.log.returns
sp_500_Health_Care_picked_2 <- sp_500_Health_Care %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(sd.log.returns %>% desc())
sp_500_Health_Care_picked_2
sp_500_Health_Care_picked_2[1,]
# 挑選標的
# WST
as.character(sp_500_Health_Care_picked_1[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Health_Care_picked_1[1,1]))
# MRNA
as.character(sp_500_Health_Care_picked_2[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Health_Care_picked_2[1,1]))
# 把警告的資料排除問題
sp_500_Financials <- sp_500_Financials %>%
filter(Symbol!= c("BRK.B"))
# Mapping the Functions to Financials Company
sp_500_Financials <- sp_500_Financials %>%
mutate(
stock.prices = map(Symbol,
function(.x) get_stock_prices(.x,
return_format = "tibble",
from = "2018-06-19",
to = "2020-06-19")
),
log.returns = map(stock.prices,
function(.x) get_log_returns(.x, return_format = "tibble")),
mean.log.returns = map_dbl(log.returns, ~ mean(.$Log.Returns)),
sd.log.returns = map_dbl(log.returns, ~ sd(.$Log.Returns)),
n.trade.days = map_dbl(stock.prices, nrow)
)
# Visualize S&P500_Financials Analysis: Stock Risk vs Reward
plot_ly(data = sp_500_Financials,
type = "scatter",
mode = "markers",
x = ~ sd.log.returns,
y = ~ mean.log.returns,
color = ~ n.trade.days,
colors = "Blues",
size = ~ n.trade.days,
text = ~ str_c("<em>", Security, "</em><br>",
"Ticker: ", Symbol, "<br>",
"Sub Sector: ", GICS.Sub.Industry, "<br>",
"No. of Trading Days: ", n.trade.days),
marker = list(opacity = 0.8,
symbol = 'circle',
sizemode = 'diameter',
sizeref = 4.0,
line = list(width = 2, color = '#FFFFFF'))
) %>%
layout(title = 'S&P500_Financials Analysis: Stock Risk vs Reward',
xaxis = list(title = 'Risk/Variability (StDev Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'Reward/Growth (Mean Log Returns)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwith = 2),
margin = list(l = 100,
t = 100,
b = 100),
font = list(color = '#FFFFFF'),
paper_bgcolor = 'rgb(0, 0, 0)',
plot_bgcolor = 'rgb(0, 0, 0)')
# 篩選標的
# METHOD1 : sd.log.returns < 0.03 and Highest mean.log.returns
sp_500_Financials_picked_1 <- sp_500_Financials %>%
filter(mean.log.returns >= 0.001,
sd.log.returns < 0.03) %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(mean.log.returns %>% desc())
sp_500_Financials_picked_1
sp_500_Financials_picked_1[1,]
# METHOD2 : Highest sd.log.returns
sp_500_Financials_picked_2 <- sp_500_Financials %>%
select(Symbol, mean.log.returns:n.trade.days) %>%
arrange(sd.log.returns %>% desc())
sp_500_Financials_picked_2
sp_500_Financials_picked_2[1,]
# 挑選標的
# MKTX
as.character(sp_500_Financials_picked_1[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Financials_picked_1[1,1]))
# LNC
as.character(sp_500_Financials_picked_2[1,1]) %>%
getSymbols(auto.assign = FALSE,
from = "2018-06-19",
to = "2020-06-19") %>%
chartSeries(name = as.character(sp_500_Financials_picked_2[1,1]))
# 最終篩選結果
results <-c(sp_500_Information_Technology_picked_1[1,1],sp_500_Information_Technology_picked_2[1,1],
sp_500_Industrials_picked_1[1,1],sp_500_Industrials_picked_2[1,1],
sp_500_Financials_picked_1[1,1],sp_500_Financials_picked_2[1,1],
sp_500_Health_Care_picked_1[1,1],sp_500_Health_Care_picked_2[1,1]) %>%
as.character()
results
library(PortfolioAnalytics)
library(PerformanceAnalytics)
tickers <- results
portfolioPrices <- NULL
for (ticker in tickers) {
portfolioPrices <- cbind(portfolioPrices,
getSymbols.yahoo(ticker, from = "2020-06-19",to = "2022-06-19", periodicity = "daily", auto.assign = F)[,4])
}
portfolioReturns <- na.omit(ROC(portfolioPrices))
portf <- portfolio.spec(colnames(portfolioReturns))
portf <- add.constraint(portf, type = 'weight_sum', min_sum = 0.99, max_sum = 1.01)
portf <- add.constraint(portf, type = "transaction_cost", ptc = 0.0075)
portf <- add.constraint(portf, type = 'box', min = .05, max = .20)
portf <- add.objective(portf, type = 'return', name = "mean")
portf <- add.objective(portf, type = 'risk', name = "StdDev", target = 0.005)
rp <- random_portfolios(portf, 10000, "sample")
opt_rebal <- optimize.portfolio.rebalancing(portfolioReturns,
portf,
optimize_method = "random",
rp = rp,
rebalance_on = "months",
training_period = 1,
rolling_window = 4)
opt_rebal
chart.Weights(opt_rebal, main = "Rebalanced Weights Over Time")
equal_weight <- rep(1/ncol(portfolioReturns),ncol(portfolioReturns))
benchmark <- Return.portfolio(portfolioReturns,weights = equal_weight)
colnames(benchmark) <- "Benchmark Portfolio"
sp500prices <- getSymbols.yahoo("SPY", from = "2020-06-19",to = "2022-06-19", periodicity = "daily", auto.assign = F)[,4]
sp500Rets <- na.omit(ROC(sp500prices))
sp500Rets <- as.xts(sp500Rets)
rebal_weights <- extractWeights(opt_rebal)
rebal_returns <- Return.portfolio(portfolioReturns, weights = rebal_weights)
rets_df <- cbind(rebal_returns, benchmark, sp500Rets)
charts.PerformanceSummary(rets_df, main = "P/L Over Time")
之前在投資學課堂上,有做一個模擬投資,那時的規則是要管理1,000萬台幣的資金,
那個作業後我意識到當管理資金規模變大時,便不能隨意地操作,因為手續費可能會成為很大的支出,
而在這過程中也發現,不論是基本面分析或技術分析,都逃離不了股價實際表現出來的樣態,
因此在製作此份專案時,便想到如果可以利用股價資料的表現,來作為選股標準,
並利用之前上過金融交易實務課堂學到的移動窗格化回測去測驗自製的投資組合績效,或許會是一個好主題!
選取S&P500作為比較對象是因為之前有讀過關於巴菲特十年之約的文章,
覺得如果能夠打敗巴菲特的想法或許是一個很有趣的方向,便產生了這份專案。
利用R語言實作網路爬蟲,對時間序列加以分析以及對資料進行分類,並繪製可視化圖表讓資料視覺化,
並實作了移動窗格最佳化回測,以及評估投資組合績效表現。
rvest
rvest 套件擷取任意的網頁資料,並將大量結構化的資料直接匯入 R 中,
讓資料分析者可以省去手動整理資料的時間。
DT
可將所有的數據呈現為交互式數據表,可自行選擇每次呈現數量的頻率為何。
purrr
可將資料整理成需要的形式
plotly
plotly 的 R 圖形庫可以製作交互式、互動式的圖形
在此處可以更方便讀者自行操作。
PortfolioAnalytics
此套件幫助我們設定投資組合限定與目標條件,
並可以自行設定參數實現移動窗格法最佳化。
PerformanceAnalytics
此套件幫助我們評估投資組合之績效,以及報酬率圖表可視化
讓我們可以有效分析策略是否有效。
上課所教的dplyr, ggplot2
上課教的這些套件,是我覺得最實用的,因為在整理資料的過程如果只用傳統命令變數的方法,程式碼會變得複雜無比,
而dplyr讓我可以更快速且精準的整理時間序列資料成所需形式,
至於ggplot2 讓我可以更好的呈現資料想要的視覺化。
網路爬蟲
在利用R進行線上爬取股價資料的過程中,需要耗費很久的時間且常會有阻擋的問題,
所以需要更換ip位置或是分批爬取資料導致分析過程很冗長。
資料自動化處理
撰寫爬取函數和log return函數時,因為這次處理的資料組成很複雜,
所以花了很多心思在思考如何整理資料成需要的形式
新函數的使用
因為想要實作移動窗格法的關係,花了很多時間研究 PortfolioAnalytics 套件,
以及搞懂需要資料的格式以及參數設置的背後理論基礎。
網站設計與專案呈現
為了讓使用者有更好的觀看體驗,嘗試了許多Rmarkdown的功能以及設計,
希望使用的各位都能很享受這份報告的視覺以及表達。
這學期花了很多時間在學習R語言的使用,從一開始薄弱的程式基礎到現在可以獨立撰寫一份專案,
特別感謝鍾經樊教授的課程指導以及專案建議,張雯晴助教的疑問解答的幫忙
最後在此放上本篇專案的參考資料,這些網路上的資源幫助我許多。
— 計財23劉冠宇